Fix edge cases for already exists resources #5239
Conversation
Force-pushed from e08fa50 to d71f726.
Codecov Report
@@ Coverage Diff @@
## main #5239 +/- ##
==========================================
- Coverage 41.43% 41.39% -0.04%
==========================================
Files 231 231
Lines 19666 19679 +13
==========================================
- Hits 8149 8147 -2
- Misses 10923 10936 +13
- Partials 594 596 +2
FYI, I have an alternative approach to this PR at openshift#188, which puts most of the logic behind an isAlreadyExistsError function instead.
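For readers unfamiliar with that approach, a minimal sketch of what such a helper could look like (illustrative only, not the actual code from openshift#188):

```go
// Illustrative sketch only -- not the implementation in openshift#188.
// Centralizing the check lets the restore flow branch on "already exists"
// failures in one place instead of inspecting errors inline.
package restore

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// isAlreadyExistsError reports whether err indicates that the resource
// being restored already exists in the target cluster.
func isAlreadyExistsError(err error) bool {
	return err != nil && apierrors.IsAlreadyExists(err)
}
```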
pkg/restore/restore.go (outdated)
if fromCluster == nil {
    fromCluster, err = resourceClient.Get(name, metav1.GetOptions{})
    if err != nil {
        ctx.log.Infof("Error retrieving cluster version of %s: %v", kube.NamespaceAndName(obj), err)
I know this is copied from existing code, but it seems more consistent (e.g. with the code at line 1247) if we merge this err into errors.
We may also want to change the log level.
@reasonerjt Hmm. I'm not sure, but yeah, you may be right. This would be a change in behavior for situations not otherwise modified by this PR. The scenario we're talking about is that the Create failed because the object already exists, but now when we do a Get call (to determine whether the in-cluster object is different from the from-backup object), the Get call fails. Ordinarily this shouldn't happen. Off the top of my head, possible causes are:
- Something outside of Velero just created this resource at almost the same time as Velero tried to restore it, and we're in a race scenario where the second Create fails due to an already existing object but the API server isn't yet finding it via Get.
- A bug or malfunction in the API server.
We're currently flagging this as a restore warning. If we want to stick with that, then we should probably change the log level from Info to Warn. The other possibility is we bump this up to a restore error and the log level to Error as well.
@shubham-pampattiwar what do you think?
@reasonerjt @sseago I agree with changing the log level to error and adding this error to errors (since the next piece of the workflow depends on the object fetched from the cluster).
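For context, a rough sketch of the agreed change within the existing restore flow; the variable names (errs, warnings, namespace) and the surrounding control flow are illustrative, not the exact code merged in this PR:

```go
// Rough sketch of the agreed change; names and control flow are illustrative.
if fromCluster == nil {
	fromCluster, err = resourceClient.Get(name, metav1.GetOptions{})
	if err != nil {
		// Log at Error level and record a restore error rather than an
		// Info-level message, since the rest of the workflow depends on
		// the object fetched from the cluster.
		ctx.log.Errorf("Error retrieving in-cluster version of %s: %v", kube.NamespaceAndName(obj), err)
		errs.Add(namespace, err)
		return warnings, errs
	}
}
```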
Done, updated the PR.
Signed-off-by: Shubham Pampattiwar <[email protected]>

add changelog file
Signed-off-by: Shubham Pampattiwar <[email protected]>

update changelog filename
Signed-off-by: Shubham Pampattiwar <[email protected]>

change log level and error type
Signed-off-by: Shubham Pampattiwar <[email protected]>
Force-pushed from d71f726 to 93a8758.
Signed-off-by: Shubham Pampattiwar <[email protected]>
Thank you for contributing to Velero!
Does your change fix a particular issue?
Fixes #5223
Please indicate you've done the following:
- Created a changelog file, or added /kind changelog-not-required as a comment on this pull request.
- Updated the corresponding documentation in site/content/docs/main.